Abstract

Pearl and Dechter (1996) claimed that the d-separation criterion for conditional
independence in acyclic causal networks also applies to networks of discrete variables that have
feedback cycles, provided that the variables of the system are uniquely determined by the 
random disturbances. I show by example that this is not true in general. Some condition 
stronger than uniqueness is needed, such as the existence of a causal dynamics guaranteed 
to lead to the unique solution. 

Causal networks (also known as Bayesian networks or belief networks) are a formalism 
for representing the joint distribution of a collection of random variables in terms of the 
conditional distributions for each variable given values for its "parent" variables. The 
structure of the distribution is represented graphically by a network in which nodes represent 
variables and arrows are drawn from parent nodes to child nodes. These arrows typically 
correspond to causal relationships. In the standard formulation, the network is not allowed 
to have directed cycles. 

When a distribution is specified by such a network, the d-separation criterion allows one
to determine that one set of random variables, A, is conditionally independent of another
set of random variables, B, given values for a third set of random variables, C. This criterion
involves only the presence or absence of arrows in the network, not the detailed numerical 
specification of the conditional distributions. See Pearl (1988) for a detailed discussion. 

Pearl and Dechter (1996) have attempted to extend this framework to networks that 
may contain directed cycles, which correspond to feedback relationships among variables. 
When cycles exist, the joint distribution is no longer specified in terms of the product of 
conditional distributions for children given parents, but rather by saying how the values of 
the observable variables, X1, ..., Xn, are determined by the values for a set of unobserved
random disturbances, U1, ..., Un, which are assumed to be independent of each other and
to have specified distributions. For each variable, Xi, an equation is given specifying that it
is equal to some function of the corresponding Ui and of some set of parent variables from
among the Xj with j ≠ i. As before, parent-child relationships are represented graphically
by drawing edges with arrows from parent nodes to child nodes. 

In order to make this scheme well-defined, Pearl and Dechter require that for any values
of U1, ..., Un there is exactly one set of values for X1, ..., Xn for which all the equations are
satisfied. If this uniqueness condition is satisfied, a distribution over U1, ..., Un will define
a distribution over X1, ..., Xn. One can then ask what conditional independence properties
this distribution might possess. 

According to Theorem 2 of Pearl and Dechter (1996), if the Xi are all discrete, the 
variables A are conditionally independent of the variables B given the variables C if the 
variables C d-separate the variables A and B. The d-separation criterion can be expressed
in terms of the following manipulations of the graph with nodes corresponding to the Xi 
and with arrows from parents to children: 

1) Delete all nodes from the graph except those in A, B, or C and their ancestors. 

2) Connect by an edge every pair of nodes that share a common child. 

3) Remove arrows from all the edges — i.e., replace each directed edge by an undirected 
edge. 

If, in the resulting graph, all paths from a node in A to a node in B pass through a node
in C, then C d-separates A from B.
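
The three graph manipulations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the `parents` dictionary layout (child mapped to a list of its parents), and the toy graphs are assumptions made here, not part of Pearl and Dechter's presentation, and the sets A, B, and C are assumed to be disjoint.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, A, B, C):
    """True if C separates A from B in the moralised ancestral graph."""
    A, B, C = set(A), set(B), set(C)
    keep = ancestors(parents, A | B | C)        # step 1: ancestral subgraph
    adj = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:                            # step 3: drop arrow directions
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(ps, 2):        # step 2: join co-parents
            adj[p].add(q)
            adj[q].add(p)
    # Search outward from A, never stepping onto a node in C.
    reached, stack = set(A), list(A)
    while stack:
        for w in adj[stack.pop()]:
            if w not in C and w not in reached:
                reached.add(w)
                stack.append(w)
    return not (reached & B)

# Hypothetical toy graphs: a chain a -> b -> c and a collider a -> c <- b.
chain = {"b": ["a"], "c": ["b"]}
collider = {"c": ["a", "b"]}
print(d_separated(chain, {"a"}, {"c"}, {"b"}))      # conditioning on b blocks the chain
print(d_separated(collider, {"a"}, {"b"}, {"c"}))   # conditioning on c joins a and b
```

Nothing in the procedure requires the graph to be acyclic, so the same function can be applied to a `parents` dictionary that contains feedback cycles.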

Figure 1 shows an example of a distribution defined in this way, which serves as a
counterexample to the claim that d-separation implies conditional independence for any
network satisfying the uniqueness condition. The variables in this example all take values
of 0 or 1. The Ui are independent and are equally likely to be 0 or 1. The Xi satisfy the
equations shown, in which addition and multiplication are done modulo 2 (i.e., in Z2). Note
that U2, U3, U6, and U7 do not appear in the equations, and hence play no role in defining
the distribution for X1, ..., X7 (see footnote 1).

The network and the equations clearly have the required syntactic form. To show that 
this is a valid example, it is also necessary to show that the Ui uniquely determine values 
for the Xi. One can easily confirm that for any values of the Ui the following values for the 
Xi will satisfy all the equations: 





    X1 = U1
    X2 = U4 + U5
    X3 = U4 + U5 + U1
    X4 = U4
    X5 = U5
    X6 = 0
    X7 = 0



To see that this is the only set of values for the Xi that satisfy all the equations, note first
that X2 + X4 + X5 must be 0, since if it is instead 1, then X6 = X7 + 1 and X7 = X6,
which is impossible. Hence X6 = X7 = 0. Since X4 = U4 and X5 = U5, we also see that
X2 + X4 + X5 = 0 implies that X2 = U4 + U5, from which it follows that X3 = U4 + U5 + U1,
since X1 = U1.

1. The example could have been simplified a bit by omitting U1 and X1 as well, but I have kept them in
order to show that the counterexample does not depend on use of such a degenerate network. One can
easily make the example less degenerate still by refining the state variables, splitting state 0 into states 0
and 0' and state 1 into states 1 and 1'. The currently unused Ui can then be allowed to influence the
choice between 0 and 0' and between 1 and 1', with this choice having no effect on the other nodes.

Figure 1: Graphical structure and equations for the counterexample. All variables take
values in {0, 1}. The Ui are independent, with equal probabilities for 0 and 1.
Addition and multiplication are done modulo 2. The Ui and the dotted arrows are
not formally part of the graph, but are shown for clarity.
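
The uniqueness argument can also be checked by brute force over all 2^7 candidate assignments. The equations of Figure 1 are not reproduced in this text, so the sketch below assumes one set of equations that is consistent with the argument given here (X2 = X1 + X3, X3 = X1 + X2, X6 = (X2 + X4 + X5)(X7 + 1), X7 = (X2 + X4 + X5) X6, all modulo 2). Treat these, and the function names, as assumptions rather than as a transcription of the figure.

```python
from itertools import product

def satisfies(x, u):
    """Check the ASSUMED structural equations, with arithmetic modulo 2."""
    x1, x2, x3, x4, x5, x6, x7 = x
    u1, u4, u5 = u                      # U2, U3, U6, U7 play no role
    s = (x2 + x4 + x5) % 2
    return (x1 == u1
            and x2 == (x1 + x3) % 2
            and x3 == (x1 + x2) % 2
            and x4 == u4
            and x5 == u5
            and x6 == (s * (x7 + 1)) % 2
            and x7 == (s * x6) % 2)

def solutions(u):
    """All assignments to (X1, ..., X7) satisfying every equation for disturbances u."""
    return [x for x in product((0, 1), repeat=7) if satisfies(x, u)]

for u in product((0, 1), repeat=3):
    u1, u4, u5 = u
    closed_form = (u1, (u4 + u5) % 2, (u1 + u4 + u5) % 2, u4, u5, 0, 0)
    # Exactly one solution, and it is the one listed in the text.
    assert solutions(u) == [closed_form]
```

Under the assumed equations the loop completes without an assertion failure, which matches the claim that the disturbances determine the Xi uniquely.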

Let us now consider whether or not X4 is conditionally independent of X5 given X2. 
We can see that X4 and X5 are d-separated by X2, since in step (1) above, we will delete
X6 and X7 and all the edges connecting to them, leaving no path from X4 to X5. However,
given any value for X2, the variables X4 and X5 are in fact dependent. As seen above,
any set of values for the Xi satisfying the equations must be such that X2 + X4 + X5 = 0.
Hence, if X2 = 0, then it must be that X4 = X5, and if instead X2 = 1, then it must be
that X4 = X5 + 1. Conditional on a value for X2, we thus see that X4 is determined by
X5, showing that they are not conditionally independent.
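
The dependence can be made concrete by tabulating the conditional distribution. The sketch below uses only the unique solution stated in the text (X2 = U4 + U5, X4 = U4, X5 = U5, modulo 2) with U4 and U5 uniform and independent; the helper name `cond_prob` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of (X2, X4, X5) induced by uniform, independent U4 and U5.
joint = {((u4 + u5) % 2, u4, u5): Fraction(1, 4)
         for u4, u5 in product((0, 1), repeat=2)}

def cond_prob(x2, x4=None, x5=None):
    """P(X4 = x4, X5 = x5 | X2 = x2); leave x4 or x5 as None to sum it out."""
    den = sum((p for (a, b, c), p in joint.items() if a == x2), Fraction(0))
    num = sum((p for (a, b, c), p in joint.items()
               if a == x2 and x4 in (None, b) and x5 in (None, c)), Fraction(0))
    return num / den

for x2, x4, x5 in product((0, 1), repeat=3):
    both = cond_prob(x2, x4, x5)
    independent = cond_prob(x2, x4=x4) * cond_prob(x2, x5=x5)
    print(f"X2={x2} X4={x4} X5={x5}: P(X4,X5|X2)={both}, product of marginals={independent}")
```

Each conditional marginal is 1/2, so independence would require every joint conditional probability to be 1/4. Instead each one is either 1/2 or 0.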

The problem appears to arise because even though specifying values for all the Ui 
uniquely determines values for all the Xi, specifying a value for U1 alone leaves two sets
of values for (X1, X2, X3) that satisfy the equations associated with these variables, even
though only one set of values will satisfy the entire set of equations. Which values for
(X1, X2, X3) are part of the overall solution depends on the value of U4 + U5, and this
induces a dependence between X4 = U4 and X5 = U5 when the value of X2 is known.
The removal of part of the graph in step (1) of the procedure for determining d-separation
eliminates any possibility of accounting for this dependence. 
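
A small enumeration illustrates the two partial solutions. Since the figure's equations are not reproduced in this text, the sketch assumes the equations for the first three variables are X1 = U1, X2 = X1 + X3, and X3 = X1 + X2 (modulo 2), which is consistent with the solution given earlier but is an assumption.

```python
from itertools import product

def partial_solutions(u1):
    """Solutions of the ASSUMED equations for X1, X2, X3 taken on their own."""
    return [(x1, x2, x3)
            for x1, x2, x3 in product((0, 1), repeat=3)
            if x1 == u1
            and x2 == (x1 + x3) % 2
            and x3 == (x1 + x2) % 2]

for u1 in (0, 1):
    print(f"U1={u1}: {partial_solutions(u1)}")
```

For each value of U1 there is one partial solution with X2 = 0 and one with X2 = 1. The constraint X2 + X4 + X5 = 0, which comes from the equations for the non-ancestors X6 and X7, is what selects between them.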

The problem with Pearl and Dechter's proof appears connected with this. They say:

At this point we invoke the fact that the constraints of C are not arbitrary but 
are functional, namely, for every values of [ the parents of Xi ] and Ui there is
a solution for Xi. This implies that, for any set W of variables, the equations 
associated with non-ancestors of W do not constrain the permitted values of 
W... 

The example here shows that the equations involving non-ancestors of W can indeed
constrain the permitted values for W.

Judea Pearl (personal communication) has suggested that the d-separation criterion
can be salvaged by requiring not only that U1, ..., Un uniquely determine X1, ..., Xn, but
also that this unique solution for X1, ..., Xn can be obtained by a procedure in which the
Xi are updated in accordance with the causal structure of the network. In such a causal
dynamical procedure, each Xi is repeatedly replaced by the value computed for it from the
corresponding Ui and the current values of its parents, according to the equation for that
Xi, until a stable state is reached. The flow of information in such a procedure follows the
direction of the arrows in the network. Consequently, nodes that are not ancestral to any
node of interest can have no influence on these nodes, justifying their elimination in step
(1) of the procedure for determining d-separation.

In general, whether or not such a dynamical procedure eventually finds the solution for
X1, ..., Xn may depend on whether the Xi are updated simultaneously or sequentially, and
if they are updated sequentially, on the order of these updates. For present purposes, it is
sufficient that some updating scheme exist that is guaranteed, for any values of U1, ..., Un,
to lead to the unique solution, starting with any initial values for X1, ..., Xn. If such
an update order exists, d-separation will imply conditional independence. In the example
above, any updating order will for some initial state lead to cyclic behaviour in which the
values of X6 and X7 flip back and forth, so the example does not satisfy this stronger
condition.
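
The cycling can be seen in a short simulation. The sketch below again assumes equations consistent with the text (X1 = U1, X2 = X1 + X3, X3 = X1 + X2, X4 = U4, X5 = U5, X6 = (X2 + X4 + X5)(X7 + 1), X7 = (X2 + X4 + X5) X6, modulo 2), since the figure itself is not reproduced here. It tries only one sequential update order, X1 through X7, so it illustrates the behaviour rather than proving the claim for every order.

```python
def sweep(x, u):
    """One sequential pass over X1..X7 in index order (ASSUMED equations, mod 2)."""
    x = list(x)
    u1, u4, u5 = u
    x[0] = u1
    x[1] = (x[0] + x[2]) % 2
    x[2] = (x[0] + x[1]) % 2
    x[3] = u4
    x[4] = u5
    s = (x[1] + x[3] + x[4]) % 2
    x[5] = (s * (x[6] + 1)) % 2
    x[6] = (s * x[5]) % 2
    return tuple(x)

def run(x, u, max_sweeps=20):
    """Return the stable state reached, or None if none appears within max_sweeps."""
    for _ in range(max_sweeps):
        nxt = sweep(x, u)
        if nxt == x:
            return x
        x = nxt
    return None

u = (0, 0, 0)                                  # values of (U1, U4, U5)
print(run((0, 0, 0, 0, 0, 0, 0), u))           # starts at the solution and stays there
print(run((0, 1, 1, 0, 0, 0, 0), u))           # X6 and X7 keep flipping: None
```

From the second starting state, X2 and X3 settle into the partial solution with X2 = 1, so X2 + X4 + X5 = 1 and the updates for X6 and X7 alternate between (1, 1) and (0, 0) without ever stabilising.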

One should note that the example in this note does not invalidate the result of Spirtes 
(1995) that d-separation can be used to determine conditional independence in linear
networks of normally distributed variables even if they contain cycles. In the same paper,
Spirtes also gave a counterexample showing that d-separation need not imply conditional 
independence in non-linear networks of continuous random variables that contain cycles. 
This problem cannot be avoided by simply discretizing the continuous variables, as the 
problem reappears in the form of non-existence or non-uniqueness of solutions. This note 
shows that in non-linear networks of discrete variables a stronger condition than uniqueness 
is required for d-separation to be valid. Although such a stronger condition involving causal 
dynamics can be seen as natural, the need to verify this stronger condition does reduce 
the attractiveness of networks with cycles as a way of formalizing causal situations with 
feedback. 